Results 1 - 13 of 13
1.
PeerJ ; 12: e16624, 2024.
Article in English | MEDLINE | ID: mdl-38188165

ABSTRACT

The Open Tree of Life (OToL) project produces a supertree that summarizes phylogenetic knowledge from tree estimates published in the primary literature. The supertree construction algorithm iteratively calls Aho's Build algorithm thousands of times in order to assess the compatibility of different phylogenetic groupings. We describe an incrementalized version of the Build algorithm that is able to share work between successive calls to Build. We provide details that allow a programmer to implement the incremental algorithm BuildInc, including pseudo-code and a description of data structures. We assess the effect of BuildInc on our supertree algorithm by analyzing simulated data and by analyzing a supertree problem taken from the OpenTree 13.4 synthesis tree. We find that BuildInc provides up to 550-fold speedup for our supertree algorithm.


Subject(s)
Algorithms , Knowledge , Phylogeny
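
The Build algorithm named in the abstract above can be illustrated with a minimal sketch of its classic, non-incremental form. This is not the BuildInc algorithm and not the OpenTree implementation; it assumes the textbook formulation in which the constraints are rooted triplets ab|c, and the function name `build` and the nested-tuple tree representation are hypothetical choices made for illustration.

```python
def build(leaves, triplets):
    """Classic Build sketch (assumed triplet formulation, not BuildInc).

    Each triplet (a, b, c) encodes ab|c: a and b are closer to each
    other than either is to c. Returns a nested-tuple tree that displays
    every triplet, or None if the triplets are incompatible.
    """
    leaves = set(leaves)
    if len(leaves) == 1:
        return next(iter(leaves))

    # Union-find over the current leaf set.
    parent = {x: x for x in leaves}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Keep only triplets whose three leaves are all still present,
    # and join a with b for each of them.
    relevant = [t for t in triplets if leaves.issuperset(t)]
    for a, b, _ in relevant:
        parent[find(a)] = find(b)

    components = {}
    for x in leaves:
        components.setdefault(find(x), set()).add(x)

    # A single component means no valid split at the root exists.
    if len(components) == 1:
        return None

    subtrees = []
    for comp in sorted(components.values(), key=min):
        sub = build(comp, relevant)
        if sub is None:
            return None
        subtrees.append(sub)
    return tuple(subtrees)


print(build({"a", "b", "c"}, [("a", "b", "c")]))                   # (('a', 'b'), 'c')
print(build({"a", "b", "c"}, [("a", "b", "c"), ("a", "c", "b")]))  # None
```

Each call recomputes the connected components from scratch, which is the repeated work that an incremental variant would aim to share between successive calls.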
2.
Malar J ; 23(1): 27, 2024 Jan 18.
Article in English | MEDLINE | ID: mdl-38238806

ABSTRACT

BACKGROUND: Though Plasmodium vivax is the second most common malaria species to infect humans, it has not traditionally been considered a major human health concern in central Africa given the high prevalence of the human Duffy-negative phenotype that is believed to prevent infection. Increasing reports of asymptomatic and symptomatic infections in Duffy-negative individuals throughout Africa raise the possibility that P. vivax is evolving to evade host resistance, but there are few parasite samples with genomic data available from this part of the world. METHODS: Whole genome sequencing of one new P. vivax isolate from the Democratic Republic of the Congo (DRC) was performed and used in population genomics analyses to assess how this central African isolate fits into the global context of this species. RESULTS: Plasmodium vivax from DRC is similar to other African populations and is not closely related to the non-human primate parasite P. vivax-like. Evidence is found for a duplication of the gene PvDBP and a single copy of PvDBP2. CONCLUSION: These results suggest an endemic P. vivax population is present in central Africa. Intentional sampling of P. vivax across Africa would further contextualize this sample within African P. vivax diversity and shed light on the mechanisms of infection in Duffy-negative individuals. These results are limited by the uncertainty of how representative this single sample is of the larger population of P. vivax in central Africa.


Subject(s)
Malaria, Vivax , Malaria , Animals , Humans , Plasmodium vivax/genetics , Malaria, Vivax/parasitology , Africa, Central , Genomics , Duffy Blood-Group System/genetics
3.
Proc Natl Acad Sci U S A ; 119(34): e2204435119, 2022 08 23.
Article in English | MEDLINE | ID: mdl-35972964

ABSTRACT

To assess the conventional treatment in evolutionary inference of alignment gaps as missing data, we propose a simple nonparametric test of the null hypothesis that the locations of alignment gaps are independent of the nucleotide substitution or amino acid replacement process. When we apply the test to 1,390 protein alignments that are informed by protein tertiary structure and use a 5% significance level, the null hypothesis of independence between amino acid replacement and gap location is rejected for ∼65% of datasets. Via simulations that include substitution and insertion-deletion, we show that the test performs well with true alignments. When we simulate according to the null hypothesis and then apply the test to optimal alignments that are inferred by each of four widely used software packages, the null hypothesis is rejected too frequently. Via further simulations and analyses, we show that the overly frequent rejections of the null hypothesis are not solely due to weaknesses of widely used software for finding optimal alignments. Instead, our evidence suggests that optimal alignments are unrepresentative of true alignments and that biased evolutionary inferences may result from relying upon individual optimal alignments.


Subject(s)
Amino Acids , Nucleotides , Proteins , Algorithms , Amino Acid Substitution , Amino Acids/genetics , Nucleotides/genetics , Proteins/genetics , Sequence Alignment , Software
5.
Bioinformatics ; 37(18): 3032-3034, 2021 09 29.
Article in English | MEDLINE | ID: mdl-33677478

ABSTRACT

SUMMARY: We describe improvements to BAli-Phy, a Markov chain Monte Carlo (MCMC) program that jointly estimates phylogeny, alignment and other parameters from unaligned sequence data. Version 3 is substantially faster for large trees, and implements covarion models, additional codon models and other new models. It implements ancestral state reconstruction, allows prior selection for all model parameters, and can also analyze multiple genes simultaneously. AVAILABILITY AND IMPLEMENTATION: Software is available for download at http://www.bali-phy.org. C++ source code is freely available on GitHub under the GPL2 License. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Software , Phylogeny , Indonesia , Markov Chains , Monte Carlo Method
6.
PeerJ ; 5: e3058, 2017.
Article in English | MEDLINE | ID: mdl-28265520

ABSTRACT

We present a new supertree method that enables rapid estimation of a summary tree on the scale of millions of leaves. This supertree method summarizes a collection of input phylogenies and an input taxonomy. We introduce formal goals and criteria for such a supertree to satisfy in order to transparently and justifiably represent the input trees. In addition to producing a supertree, our method computes annotations that describe which groupings in the input trees support and conflict with each group in the supertree. We compare our supertree construction method to a previously published supertree construction method by assessing their performance on input trees used to construct the Open Tree of Life version 4, and find that our method increases the number of displayed input splits from 35,518 to 39,639 and decreases the number of conflicting input splits from 2,760 to 1,357. The new supertree method also improves on the previous supertree construction method in that it produces no unsupported branches and avoids unnecessary polytomies. This pipeline is currently used by the Open Tree of Life project to produce all of the versions of the project's "synthetic tree" starting at version 5. This software pipeline is called "propinquity". It relies heavily on "otcetera", a set of C++ tools that perform most of the steps of the pipeline. All of the components are free software and are available on GitHub.

7.
Genetics ; 201(3): 1171-88, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26374460

ABSTRACT

We present a Bayesian method for characterizing the mating system of populations reproducing through a mixture of self-fertilization and random outcrossing. Our method uses patterns of genetic variation across the genome as a basis for inference about reproduction under pure hermaphroditism, gynodioecy, and a model developed to describe the self-fertilizing killifish Kryptolebias marmoratus. We extend the standard coalescence model to accommodate these mating systems, accounting explicitly for multilocus identity disequilibrium, inbreeding depression, and variation in fertility among mating types. We incorporate the Ewens sampling formula (ESF) under the infinite-alleles model of mutation to obtain a novel expression for the likelihood of mating system parameters. Our Markov chain Monte Carlo (MCMC) algorithm assigns locus-specific mutation rates, drawn from a common mutation rate distribution that is itself estimated from the data using a Dirichlet process prior model. Our sampler is designed to accommodate additional information, including observations pertaining to the sex ratio, the intensity of inbreeding depression, and other aspects of reproduction. It can provide joint posterior distributions for the population-wide proportion of uniparental individuals, locus-specific mutation rates, and the number of generations since the most recent outcrossing event for each sampled individual. Further, estimation of all basic parameters of a given model permits estimation of functions of those parameters, including the proportion of the gene pool contributed by each sex and relative effective numbers.


Subject(s)
Models, Biological , Mutation , Self-Fertilization , Algorithms , Animals , Bayes Theorem , Biological Evolution , Caryophyllaceae , Computer Simulation , Data Accuracy , Female , Fundulidae , Male , Microsatellite Repeats
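
The Ewens sampling formula (ESF) mentioned in the abstract above has a standard closed form, which can be sketched as follows. This shows only the textbook ESF under the infinite-alleles model, not the paper's extended likelihood for mating-system parameters; the function name `ewens_probability` and the input layout (`counts[j-1]` is the number of alleles observed exactly j times) are assumptions made for illustration.

```python
from math import factorial


def ewens_probability(counts, theta):
    """Probability of an allele-count configuration under the standard ESF.

    counts[j-1] is a_j, the number of distinct alleles seen exactly j
    times in the sample; theta is the scaled mutation rate.
    P = n! / (theta (theta+1) ... (theta+n-1)) * prod_j theta^a_j / (j^a_j a_j!)
    """
    # Sample size n = sum over j of j * a_j.
    n = sum(j * a for j, a in enumerate(counts, start=1))

    # Rising factorial theta (theta + 1) ... (theta + n - 1).
    rising = 1.0
    for i in range(n):
        rising *= theta + i

    prob = factorial(n) / rising
    for j, a in enumerate(counts, start=1):
        prob *= theta ** a / (j ** a * factorial(a))
    return prob


# For a sample of two genes there are two configurations:
# two singleton alleles, or one allele seen twice.
print(ewens_probability([2, 0], theta=1.0))  # 0.5
print(ewens_probability([0, 1], theta=1.0))  # 0.5
```

Summing the function over every configuration of a given sample size yields 1, which is a convenient sanity check on an implementation.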
8.
Evolution ; 66(1): 135-46, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22220870

ABSTRACT

Currently available phylogenetic methods for studying the rate of evolution in a continuously valued character assume that the rate is constant throughout the tree or that it changes along specific branches according to an a priori hypothesis of rate variation provided by the user. Herein, we describe a new method for studying evolutionary rate variation in continuously valued characters given an estimate of the phylogenetic history of the species in our study. According to this method, we propose no specific prior hypothesis for how the variation in evolutionary rate is structured throughout the history of the species in our study. Instead, we use a Bayesian Markov chain Monte Carlo approach to estimate evolutionary rates and the shift point between rates on the tree. We do this by simultaneously sampling rates and shift points in proportion to their posterior probability, and then collapsing the posterior sample into an estimate of the parameters of interest. We use simulation to show that the method is quite successful at identifying the phylogenetic position of a shift in the rate of evolution, and that estimated rates are asymptotically unbiased. We also provide an empirical example of the method using data for Anolis lizards.


Subject(s)
Biological Evolution , Lizards/genetics , Models, Genetic , Phenotype , Animals , Computer Simulation , Phylogeny
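
The idea in the abstract above, sampling parameters in proportion to their posterior probability and then collapsing the sample into an estimate, can be illustrated with a generic random-walk Metropolis sampler. This is a toy sketch on a one-dimensional target, not the paper's model of rates and shift points on a tree; the function name `metropolis`, its parameters, and the standard-normal target are all assumptions chosen for illustration.

```python
import math
import random


def metropolis(log_density, start, steps, step_size=1.0, seed=0):
    """Random-walk Metropolis sampler for a 1-D unnormalized log density."""
    rng = random.Random(seed)
    x = start
    log_p = log_density(x)
    samples = []
    for _ in range(steps):
        # Symmetric Gaussian proposal centred on the current state.
        proposal = x + rng.gauss(0.0, step_size)
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p_new / p).
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            x, log_p = proposal, log_p_new
        samples.append(x)
    return samples


def log_standard_normal(x):
    """Toy target: standard normal, up to an additive constant."""
    return -0.5 * x * x


draws = metropolis(log_standard_normal, start=3.0, steps=20000)
kept = draws[2000:]  # discard an initial burn-in segment
posterior_mean = sum(kept) / len(kept)  # collapse the sample into a point estimate
print(round(posterior_mean, 2))
```

Because states are visited in proportion to the target density, a simple average over the retained draws serves as the point estimate.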
9.
Mycologia ; 103(2): 361-78, 2011.
Article in English | MEDLINE | ID: mdl-21139031

ABSTRACT

The Caloplaca saxicola group is the main group of saxicolous, lobed-effigurate species within genus Caloplaca (Teloschistaceae, lichen-forming Ascomycota). A recent monographic revision by the first author detected a wide range of morphological variation. To confront the phenotypically based circumscription of these taxa and to resolve their relationships, morphological and ITS rDNA data were obtained for 56 individuals representing eight Caloplaca species belonging to the C. saxicola group. We tested the monophyly of these eight morphospecies by performing maximum parsimony, maximum likelihood and two different types of Bayesian analyses (with and without a priori alignments). Restricting phylogenetic analyses to unambiguously aligned portions of ITS was sufficient to resolve, with high bootstrap support, five of the eight previously recognized species within the C. saxicola group. However, phylogenetic resolution of all or most of the eight species currently included as two distinct subgroups within the C. saxicola group was possible only by combining morphological characters and signal from ambiguously aligned regions with the unambiguously aligned ITS sites or when the entire ITS1 and 2 regions were not aligned a priori and included as an integral component of a Bayesian analysis (BAli-Phy). The C. arnoldii subgroup includes C. arnoldii, comprising four subspecies, and the C. saxicola subgroup encompasses seven species. Contrary to the C. saxicola subgroup, monophyly of taxa included within the C. arnoldii subgroup and their relationships could not be resolved with combined ITS and morphological data. Unequivocal morphological synapomorphies for all species except C. arnoldii and C. pusilla are recognized and presented.


Subject(s)
Ascomycota/classification , Phylogeny , Ascomycota/genetics , Ascomycota/growth & development , Bayes Theorem , DNA, Fungal/genetics , DNA, Ribosomal/genetics , Molecular Sequence Data
10.
Philos Trans R Soc Lond B Biol Sci ; 363(1512): 3931-9, 2008 Dec 27.
Article in English | MEDLINE | ID: mdl-18852105

ABSTRACT

Models of molecular evolution tend to be overly simplistic caricatures of biology that are prone to assigning high probabilities to biologically implausible DNA or protein sequences. Here, we explore how to construct time-reversible evolutionary models that yield stationary distributions of sequences that match given target distributions. By adopting comparatively realistic target distributions, evolutionary models can be improved. Instead of focusing on estimating parameters, we concentrate on the population genetic implications of these models. Specifically, we obtain estimates of the product of effective population size and relative fitness difference of alleles. The approach is illustrated with two applications to protein-coding DNA. In the first, a codon-based evolutionary model yields a stationary distribution of sequences, which, when the sequences are translated, matches a variable-length Markov model trained on human proteins. In the second, we introduce an insertion-deletion model that describes selectively neutral evolutionary changes to DNA. We then show how to modify the neutral model so that its stationary distribution at the amino acid level can match a profile hidden Markov model, such as the one associated with the Pfam database.


Subject(s)
DNA/genetics , Evolution, Molecular , Genetics, Population , Models, Genetic , Proteins/genetics , Computer Simulation , INDEL Mutation/genetics , Markov Chains , Population Density , Selection, Genetic
11.
BMC Evol Biol ; 7: 40, 2007 Mar 14.
Article in English | MEDLINE | ID: mdl-17359539

ABSTRACT

BACKGROUND: Phylogenies of rapidly evolving pathogens can be difficult to resolve because of the small number of substitutions that accumulate in the short times since divergence. To improve resolution of such phylogenies we propose using insertion and deletion (indel) information in addition to substitution information. We accomplish this through joint estimation of alignment and phylogeny in a Bayesian framework, drawing inference using Markov chain Monte Carlo. Joint estimation of alignment and phylogeny sidesteps biases that stem from conditioning on a single alignment by taking into account the ensemble of near-optimal alignments. RESULTS: We introduce a novel Markov chain transition kernel that improves computational efficiency by proposing non-local topology rearrangements and by block sampling alignment and topology parameters. In addition, we extend our previous indel model to increase biological realism by placing indels preferentially on longer branches. We demonstrate the ability of indel information to increase phylogenetic resolution in examples drawn from within-host viral sequence samples. We also demonstrate the importance of taking alignment uncertainty into account when using such information. Finally, we show that codon-based substitution models can significantly affect alignment quality and phylogenetic inference by unrealistically forcing indels to begin and end between codons. CONCLUSION: These results indicate that indel information can improve phylogenetic resolution of recently diverged pathogens and that alignment uncertainty should be considered in such analyses.


Subject(s)
Frameshift Mutation , Genes, Viral , HIV-1/genetics , Models, Genetic , Phylogeny , Simian Immunodeficiency Virus/genetics , Base Sequence , Bayes Theorem , Codon, Terminator , Markov Chains , Sequence Alignment
12.
Bioinformatics ; 22(16): 2047-8, 2006 Aug 15.
Article in English | MEDLINE | ID: mdl-16679334

ABSTRACT

SUMMARY: BAli-Phy is a Bayesian posterior sampler that employs Markov chain Monte Carlo to explore the joint space of alignment and phylogeny given molecular sequence data. Simultaneous estimation eliminates bias toward inaccurate alignment guide-trees, employs more sophisticated substitution models during alignment and automatically utilizes information in shared insertion/deletions to help infer phylogenies. AVAILABILITY: Software is available for download at http://www.biomath.ucla.edu/msuchard/bali-phy.


Subject(s)
Bayes Theorem , Computational Biology/methods , Algorithms , Amino Acid Sequence , Internet , Markov Chains , Molecular Sequence Data , Monte Carlo Method , Phylogeny , Programming Languages , Sequence Alignment , Software
13.
Syst Biol ; 54(3): 401-18, 2005 Jun.
Article in English | MEDLINE | ID: mdl-16012107

ABSTRACT

We describe a novel model and algorithm for simultaneously estimating multiple molecular sequence alignments and the phylogenetic trees that relate the sequences. Unlike current techniques that base phylogeny estimates on a single estimate of the alignment, we take alignment uncertainty into account by considering all possible alignments. Furthermore, because the alignment and phylogeny are constructed simultaneously, a guide tree is not needed. This sidesteps the problem in which alignments created by progressive alignment are biased toward the guide tree used to generate them. Joint estimation also allows us to model rate variation between sites when estimating the alignment and to use the evidence in shared insertion/deletions (indels) to group sister taxa in the phylogeny. Our indel model makes use of affine gap penalties and considers indels of multiple letters. We make the simplifying assumption that the indel process is identical on all branches. As a result, the probability of a gap is independent of branch length. We use a Markov chain Monte Carlo (MCMC) method to sample from the posterior of the joint model, estimating the most probable alignment and tree and their support simultaneously. We describe a new MCMC transition kernel that improves our algorithm's mixing efficiency, allowing the MCMC chains to converge even when started from arbitrary alignments. Our software implementation can estimate alignment uncertainty and we describe a method for summarizing this uncertainty in a single plot.


Subject(s)
Algorithms , Bayes Theorem , Classification/methods , Models, Genetic , Phylogeny , Sequence Alignment/methods , Amino Acid Sequence , Base Sequence , Computer Simulation , Markov Chains , Molecular Sequence Data , Monte Carlo Method , RNA, Ribosomal, 5S/genetics